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Abstract 

In  [10]  a  method  was  introduced  for  solving  Poisson's  or  the  bihar- 
monic equation  on  an  irregular  region  by  making  use  of  an  integral 
equation  formulation.  Because  fast  solvers  were  used  to  extend  the 
solution  to  an  enclosing  rectangle,  this  method  avoided  many  of  the 
standard  problems  associated  with  integral  equations.  The  equations 
that  arose  were  Fredholm  integral  equations  of  the  second  kind  with 
bounded  kernels.  In  this  paper  we  use  iterative  methods  to  solve  the 
dense  nonsymmetric  linear  systems  arising  from  the  integral  equations. 
Because  the  matrices  are  very  well-conditioned,  conjugate  gradient-like 
methods  can  be  used  and  will  converge  very  rapidly.  The  methods  are 
very  amenable  to  vectorization  and  parallelization,  and  we  describe 
parallel  and  vector  implementations  on  shared  memory  multiproces- 
sors. Numerical  experiments  are  described  and  results  presented  for  a 
3-dimensional  interface  problem  for  the  Laplacian  on  a  recording  head 
geometry. 


'Part  of  this  work  was  performed  while  visiting  the  Courant  Institute  of  Mathematical 
Sciences. 

'This  work  was  supported  by  the  Applied  Mathematical  Sciences  Program  of  the  U.S. 
Department  of  Energy  under  contract  DE-AC02-76ER03077. 


1      Introduction 

In  this  paper  we  present  efficient  parallel  numerical  methods  for  solving 
Poisson's  and  the  biharmonic  equations  on  general  regions.  Aside  from  be- 
ing parallel,  these  methods  also  vectorize  very  well.  This  is  in  contrast  to 
other  methods  for  solving  these  equations  on  general  irregular  regions,  no- 
tably finite  element  methods.  Finite  element  methods  may  require  use  of  a 
fast  scatter/gather  operation  in  order  to  efficiently  perform  a  matrix- vector 
multiply  for  the  large  sparse  matrix  that  arises.  In  addition,  many  of  the 
standard  preconditioners  used  in  the  iterative  solution  of  such  equations 
require  inherently  sequential  operations  (e.g.,  backsolving  with  the  incom- 
plete Cholesky  decomposition  [12]).  This  is  especially  true  in  the  case  of 
the  biharmonic  equation,  and  we  know  of  no  other  effective  parallel  and 
vectorizable  methods  for  solving  it. 

The  methods  presented  here  combine  integral  equation  formulations  of 
the  problems  with  rapid  finite  difference  methods  on  a  larger  rectangular 
region  in  which  the  irregular  region  is  embedded.  Both  the  integral  equation 
method  and  the  finite  difference  method  parallelize  and  vectorize  well.  By 
using  well-conditioned  integral  equation  formulations  and  iterative  methods 
for  solving  them,  the  main  part  of  the  computation  is  reduced  to  the  iterative 
solution  of  a  dense  nonsymmetric  matrix  equation,  instead  of  a  (much  larger) 
sparse  (although  symmetric)  system  of  equations  with  an  irregular  pattern 
of  nonzero  elements. 

The  method  used  is  very  similar  to  the  method  developed  in  [10].  The 
main  idea  is  to  use  the  integral  equation  formulation  to  define  a  discontinu- 
ous extension  of  the  solution  to  the  remainder  of  the  embedding  region.  All 
the  discontinuities  between  the  original  function  and  its  extension  can  be 
expressed  in  terms  of  the  solution  of  the  integral  equation.  These  disconti- 
nuities are  used  to  compute  an  approximation  to  the  discrete  Laplacian  of 
the  combined  function  at  mesh  points  near  the  original  boundary.  Fast  Pois- 
son  solvers  are  then  used  to  compute  the  extended  function.  This  method 
can  also  be  used  to  solve  directly  for  the  derivatives  of  the  solution,  which 
are  usually  the  physically  meaningful  quantities. 

This  combination  of  integral  equation  formulations  with  finite  difference 
methods  on  a  larger  rectangular  region  overcomes  two  of  the  standard  prob- 
lems usually  associated  with  integral  equation  methods.  First,  because  the 
solution  is  actually  computed  using  a  fast  Poisson  solver,  it  is  not  necessary 
to  compute  a  costly  integral  in  order  to  evaluate  the  solution  at  many  points. 
Second,  although  the  kernel  of  the  integral  is  singular,  and  hence  difficult  to 


evaluate  accurately  quadrature  near  the  boundary,  the  solution  can  be 
computed  very  accurately  using  the  fast  Poisson  solver.  In  addition,  since 
the  methods  make  use  of  integral  equation  formulations,  they  are  very  well- 
suited  to  problems  on  exterior  regions.  A  future  aim  is  to  use  these  methods 
for  solving  nonlinear  magnetostatic  problems  in  unbounded  domains. 

Work  described  in  the  current  paper  differs  from  that  in  [10]  in  several 
important  respects.  One  difference  is  that  we  use  a  different  integral  equa- 
tion formulation  of  the  biharmonic  equation,  which  has  probably  not  been 
applied  before  in  numerical  computations.  The  other  primary  difference  is 
the  way  in  which  the  integral  equations  are  solved.  Conjugate  gradient  type 
iterative  methods  (i.e.,  the  biconjugate  gradient  /conjugate  gradient  squared 
method  [5,18];  GMRES  [16];  and  the  conjugate  gradient  method  applied  to 
the  normal  equations  [6]),  with  simple  diagonal  preconditioned,  are  used  to 
solve  both  Laplace's  and  the  biharmonic  equation.  Since  most  of  the  op- 
erations in  these  methods  consist  of  matrix-vector  products,  and  since  the 
matrices  are  dense,  it  is  easy  to  see  how  the  calculations  can  be  performed 
in  parallel  and  each  of  the  parallel  sections  can  be  vectorized.  The  matrix 
can  be  divided  into  blocks,  and  each  processor  can  handle  multiplication 
by  that  block,  while  vectorization  can  be  performed  over  the  rows  within 
the  block.  This  can  actually  be  accomplished  using  a  level-2  Bias  routine 
[4],  and  these  routines  have  already  been  optimized  on  many  parallel/vector 
supercomputers. 

We  also  note  that  the  integral  equations  used  for  solving  both  of  these 
problems  are  well-conditioned  Fredholm  integral  equations  of  the  second 
kind,  and  so  the  matrices  have  eigenvalues  and  singular  values  which  are 
bounded  independent  of  the  number  of  discretization  points.  Therefore  we 
were  able  to  achieve  convergence  in  a  small  number  of  iterations.  In  partic- 
ular, we  were  able  to  solve  the  biharmonic  equation  on  general  nonconvex 
regions,  and,  except  on  very  eccentric  ellipses,  all  of  the  iterative  meth- 
ods tested  converged  in  fewer  than  30  iterations.  To  solve  the  biharmonic 
equation  on  an  ellipse  with  eccentricity  10  or  20,  required  no  more  than  32 
GMRES  iterations,  provided  enough  direction  vectors  were  saved. 

In  addition  to  using  iterative  methods  to  solve  the  integral  equations, 
we  also  solved  equations  with  different  types  of  boundary  conditions  than 
before.  Specifically,  we  solved  a  3-D  exterior  interface  problem  in  magne- 
tostatics  that  arises  from  modeling  a  recording  head,  and  a  2-D  Dirichlet 
problem  for  Laplace's  equation  on  a  doubly  connected  region. 

We  note  that  fast  Poisson  solvers  have  also  been  used  for  solving  Pois- 
son's  equation  on  irregular  regions  through  the  use  of  capacitance  matrix 


methods  [2,14].  Our  method  is,  however,  better  suited  to  problems  on  exte- 
rior regions.  Although,  by  using  a  trick  due  to  Hockney  [7]  and  improved  by 
James  [8],  it  is  possible  to  use  capacitance  matrix  methods  to  solve  exterior 
problems,  both  the  operation  count  and  the  storage  requirements  for  the 
fast  solver  are  significantly  increased.  More  important,  capacitance  matrix 
methods  do  not  allow  one  to  solve  for  derivatives  directly.  Moreover,  we 
know  of  no  implementation  of  a  capacitance  matrix  method  for  solving  the 
biharmonic  equation  on  an  irregular  region  or  for  solving  interface  problems. 
The  organization  of  the  paper  is  as  follows.  Section  2  presents  the  in- 
tegral equation  formulations,  and  Section  3  the  method  used  for  obtaining 
solutions  to  the  differential  equations,  given  the  solutions  of  the  integral 
equations.  Section  4  describes  results  of  numerical  experiments  carried  out 
on  the  8  processor  New  York  University  ultracomputer  prototype,  on  the 
Cray  X-MP,  and  on  the  IBM  3090. 

2      Integral  Equations 

2.1      Laplace's  Equation 

Laplace's  equation  in  two  dimensions  with  Dirichlet  boundary  data  g(t) 
prescribed  on  the  boundary  curve  (x(s),  y(s))  is  solved  in  terms  of  a  double 
layer  density  function.  In  this  formulation,  the  solution  u(z)  is  expressed  as 
the  integral  over  the  boundary  of  the  region  of  the  product  of  an  unknown 
density  function  fi  with  the  normal  derivative  of  the  Green's  function  in  the 
plane: 

u{z)   =    —  / n{s)ds,       zeD,  (1) 

2.x  Jap         on, 

where      r(s,z)   =    \s  —  z\. 
If  the  region  is  simply  connected,  the  density  function  fi  satisfies 

„(„  +  i  /   2*gk2#<.>i.  =  MO- 

tf  JdD        on, 
If  the  region  is  doubly  connected,  with  boundary  curves  Lq  and  L\,  and 
u{z)  is  the  real  part  of  a  single  valued  analytic  function,  then  the  density 
function  satisfies  the  following  equation  [13]: 

+    1    /      01ogr(s,O+fl(3  tt))fl(s)ds    =    2g(t),  (2) 

7r  JaD         on, 


.1     if  5  and  t  both  he  on  L\ 
where      a(s,t)    =    <    .         . 

1    0    otherwise 

These  are  Fredholm  integral  equations  of  the  second  kind  and,  if  the  bound- 
ary 3D  is  smooth  enough,  they  can  be  solved  very  accurately  numerically. 
Outside  D,  formula  (1)  defines  another  harmonic  function  u(z): 

.,   x  1     [     dlogr(s.z)    ,  .  , 

u<2     =    ^  /     fcLj-J^(5  d5-  3 

27T  Jqd         on, 

The  function  u  is,  of  course,  a  discontinuous  extension  of  the  function  u. 
However,  all  the  discontinuities  between  u  and  u  can  be  expressed  in  terms 
of  the  density  /z  and  the  boundary  curve.  In  Section  3  we  show  how  to 
determine  the  discontinuities  and  how  to  use  them  to  rapidly  compute  an 
approximation  to  the  combined  function  U(z),  which  is  equal  to  u(z)  inside 
D,  and  u(z)  outside  D. 

We  can  also  use  formula  (1)  to  express  the  conjugate  function  v(z)  to 
u{z)  in  terms  of  the  density  fi.  Since  the  conjugate  harmonic  function  of 
— I  »     ;  is  — <t '     ' ,  we  have 

on,  as  ' 

1     /"     dloer(s.z)    ... 

v&  =  ^z  /  — ar-LtiWds-  (4 

27T  JdD  OS 

2.2     Interface  Problems 

A  similar  formulation  can  be  used  to  solve  certain  interface  problems.  In 
linear  magnetostatics,  for  example,  the  following  problem  arises.  One  seeks 
to  find  a  function  u(z)  defined  on  a  magnetizable  region  D  and  on  the  infinite 
exterior  region  outside  D  such  that 

V-aVu(2)   =   4>{z), 

where  <f>(z)  is  the  given  potential  function  for  the  applied  field,  a  is  the 
magnetic  permeability  of  the  region,  and  u(z)  is  the  unknown  potential 
function  for  the  H  field  (i.e.,  the  function  satisfying  yu  =  —H ).  In  typical 
linear  problems,  a  is  equal  to  a  large  constant  a.\  inside  the  magnetizable 
region  D,  and  equal  to  a  small  constant  ao  in  the  exterior  region.  Continuity 
of  the  H  field  and  the  normal  component  of  the  B  field  imply  that  u(z)  and 
aun(z)  are  continuous  across  the  boundary  of  D. 

It  turns  out  that  both  inside  and  outside  D,  the  potential  u  can  be 
expressed  as  the  sum  of  a  term  involving  the  applied  potential  and  the 


integral  of  a  double  layer  density  function,  where  the  density  is  the  value  of 
the  potential  on  the  boundary: 

o+l  /     dG(s,z)  <f>(z) 

u(z)   =    /      - -u(s)ds  +    — —,       z(D  (5) 

a     Jdo       on  a 

u(z)    =    (fi  +  1)/     dGlS,Z)u(s)ds  +   <t>(z),       zeDc  (6) 

JdD      on 

where  a  =  **■  is  the  relative  premeability.  See  [9].  On  the  boundary  of  the 
region,  the  potential  u(t)  satisfies 

_  fi-1  l    dGj±£  = 

a  +  1  JdD      on 
In  two  dimensions,  the  kernel  (Green's  function)  is  j^  log  r(s,  z),  while  in 
three  dimensions  it  is  4yrK  x\  ■     We  have  solved  this  integral  equation  in 
three  dimensions,  in  which  case  the  kernel,  while  in  L2(dD),  is  unbounded. 

2.3      The  Biharmonic  Equation 

The  integral  equation  formulation  used  for  the  biharmonic  equation  relies  on 
the  representation  of  a  biharmonic  function  in  terms  of  Goursat  functions. 
Any  two  dimensional  biharmonic  function  W(z)  can  be  expressed  as 

W{z)   =   Re{z4>{z)  +  X(z)),  (8) 

where  4>{z)  and  x(-z)  are  analytic  functions.  The  functions  4>{z)  and  ip(z)  = 
x'(-z)  are  known  as  the  Goursat  functions.  For  a  given  biharmonic  function 
W(z),  the  Goursat  functions  are  not  uniquely  determined.  More  precisely, 
4>'(z)  is  determined  to  within  a  purely  imaginary  additive  constant,  and  so 
4>{z)  is  determined  to  within  an  additive  term  of  the  form 

iaz  +   0, 

where  a  is  real  and  /3  is  complex.  Similarly,  ip(z)  is  not  uniquely  determined. 
However,  all  the  physically  meaningful  quantities,  such  as  velocities  in  fluid 
mechanics  or  stresses  and  displacements  in  elasticity  can  be  expressed  in 
terms  of  certain  derivatives  of  the  Goursat  functions,  which  are  uniquely 
determined.  Therefore,  it  suffices  to  compute  these  two  functions,  and  not 
the  biharmonic  function  itself.  Both  of  these  functions  can  be  expressed  as 
Cauchy  integrals: 


4>(z)   =    —  /     — —ds  ,        rp(z)   =   —  /     -^ K-^-ds  .       (9) 

v   ;         2tti  JdDs-  z  rv  2TriJdD         s-z  K  ' 

The  integral  equation  one  uses  to  determine  the  density  function  ui(t) 
depends  on  the  boundary  conditions.  We  have  considered  the  case  in  which 
Wx  and  Wy  (or,  equivalently,  W  and  Wn)  are  prescribed.  In  previous  work 
[10],  the  following  integral  equation,  originally  developed  by  Lauricella  and 
Sherman  [17],  was  used: 

u(t)  +  -  [    u{s)dd  -  -  I    Q(s)e2ied6  +  —.Re(  f    -^^ds)  =  f(t), 

7T  JdD  71"  JdD  *l  JdD  \S  -  Cy 

(10) 
where 

f(t)   =   Wx(t)  +   iWy(t), 

»,     ,  ,y(s)-y(t).      ,         i  i  t-c 

6(s,t)   =   arctan(^ — ^),       b   = -^ — -7   +   - 


i(s)-x(0  t-c         (i-c)2         (<-c)2' 

and  c  is  any  point  inside  the  region  D.  It  is  important  to  note  that  the 
kernel  of  this  equation  is  always  bounded. 

It  is  known  [13]  that  if  a  function  u(t)  satisfies  this  equation,  then  the 
third  integral  in  the  equation  is  zero,  provided  f(t)  satisfies  the  natural 
compatibility  condition  Re(fdD  f(t)dt)  =  0.  Thus  equation  (10)  can  be 
replaced  by 

u(t)  +    -  /    u(s)d6  -    -  f    u(s)e2l6d0   =    /(«).  (11) 

fl"  JdD  X  JdD 

We  chose  to  solve  this  integral  equation  instead  of  (10),  despite  the  fact 

that  this  equation  need  not  have  a  unique  solution.  This  is  acceptable  for 

the  following  reason.  Any  solution  oJo(t)  of  the  homogeneous  equation 

wo(0  +  -  /    uq(s)<L9 /    Qo{a)e2i6dB  =  0 

7T  JdD  T  JdD 

corresponds  to  the  Goursat  functions  4>o{z)  =  iaz  and  il>o(z)  =  0,  where  a 
is  real  [13].  Since  4>{z)  is  only  determined  up  to  terms  of  this  form,  and 
since  the  physically  meaningful  quantities  are  not  changed  by  adding  terms 
of  this  type  to  4>(z),  any  solution  of  the  modified  equation  will  provide  a 
physically  correct  solution.  Existence  of  multiple  solutions  is  not  a  problem 


for  the  iterative  methods  we  used,  as  will  be  explained  in  Section  4.  Fur- 
thermore, it  is  easy  to  show  that  whenever  the  kernel  |^  is  symmetric,  the 
kernel  of  (11)  will  be  symmetric.  This  is  true,  for  example,  when  the  region 
is  an  ellipse.  A  symmetric  kernel  is  an  advantage,  since  in  this  case  the 
minimum  residual  variant  of  the  conjugate  gradient  algorithm  can  be  used 
to  solve  the  problem  (and,  in  fact,  the  biconjugate  gradient  method  and  the 
(unrestarted)  GMRES  method  reduce  to  the  minimum  residual  conjugate 
gradient  algorithm).  Conjugacy  of  direction  vectors  is  maintained  without 
having  to  save  vectors  and  explicitly  orthogonalize.  In  addition,  even  when 
the  kernel  is  not  symmetric,  we  found  that  the  kernel  of  (11)  is  closer  to 
being  normal  than  that  of  (10).  Experimental  evidence  indicated  that  con- 
vergence was  always  more  rapid  with  equation  (11)  than  with  equation  (10), 
so  it  was  decided  to  use  this  modified  equation. 

3      Solving  the  Differential  Equations 

After  solving  the  integral  equations  of  Section  2  for  the  density  function  /i, 
we  are  then  able  to  compute  the  solution  u(z),  at  points  z  within  the  region 
D,  in  the  following  manner.  Discontinuous  extensions  of  the  solutions  to  the 
differential  equations  are  defined  throughout  a  larger  embedding  rectangu- 
lar region.  Specifically,  equation  (1)  defines  the  extension  of  the  solution  of 
Laplace's  equation,  and  equation  (9)  defines  the  extensions  of  the  real  and 
imaginary  parts  of  the  Goursat  functions  for  the  biharmonic  equation.  These 
functions  are  all  harmonic,  except  at  the  boundary  of  the  original  region. 
Therefore,  their  discrete  Laplacians  are  all  zero,  up  to  truncation  error,  ex- 
cept at  mesh  points  near  the  boundary.  We  compute  approximations  to  the 
discrete  Laplacians  at  these  points  near  the  boundary  and  approximations 
to  the  extended  function  at  the  edge  of  the  embedding  region.  Then  we 
apply  fast  Poisson  solvers  to  obtain  approximate  solutions  to  the  differential 
equations,  throughout  the  domain  D. 

3.1      Solution  of  Laplace's  Equation 

The  integral  equation  (1)  can  be  used  with  the  second-order  accurate  five- 
point  discrete  operator  A5h  to  compute  a  second  order  accurate  solution  to 
Au  =  0  on  an  irregular  region  D  with  smooth  boundary  3D  =  (x(s),y(s)) 
on  which  smooth  Dirichlet  boundary  data  u(s)  =  g(s)  is  prescribed.  The 
procedure  is  as  follows. 


First  embed  D  in  some  regular  region  R,  such  as  a  square  with  a  uniform 
mesh  of  width  h  in  the  z  and  y  directions.  Define  a  discontinuous  extension  u 
of  u  throughout  the  region  R,  using  formula  (3).  As  noted  previously,  all  the 
jump  discontinuities  in  the  derivatives  of  u  and  u  can  be  expressed  in  terms 
of  the  density  function  and  the  derivatives  of  the  boundary  curve.  Once 
the  discontinuities  in  the  derivatives  of  the  combined  function  are  known 
and  the  distances  from  the  boundary  curve  to  the  neighboring  mesh  points 
are  computed,  it  is  easy  to  compute  the  discrete  Laplacian  of  the  combined 
function.  Knowing  an  approximation  to  the  discrete  Laplacian  throughout 
the  regular  region  R,  a  fast  Poisson  solver  can  then  be  used  to  obtain  an 
approximation  to  the  solution  u. 

Define  U(z)  to  be  the  combined  function, 

U(z)  =   {  U{z)    Z  f  ° 

The  discontinuity  between  u  and  u  at  a  point  on  the  boundary  of  D  is  equal 
to  the  value  of  the  density  at  that  point  [13].  Therefore,  the  discontinuity 
between  their  tangential  derivatives  is  equal  to  the  derivative  of  the  density, 

There  is  no  discontinuity  between  their  normal  derivatives: 


Using  these  two  facts  and  the  direction  of  the  curve,  we  can  compute  the 
discontinuities  between  ux  and  ux  and  between  uy  and  uy: 

,  .       .    .  ,  n'(s)x'{s)  .  .       ...  »'(s)y'(s) 

U^S)  ~  u*(s)  =  xw  +  tisr  Uy{s)  ~ Uy{s)  =  *w  +  »w  • (  } 

Discontinuities  in  higher  order  derivatives  of  U  can  be  obtained  by  differen- 
tiating these  expressions. 

In  two  dimensions  it  is  also  possible  to  compute  these  discontinuities  by 
using  the  fact  that  U(z)  is  the  real  part  of  a  Cauchy  integral  with  density 
function  /i.  Let 


ft  \  l     f     ^>jr 


The  kernel  in  (1)  is  the  real  part  of  the  Cauchy  kernel: 

V27rt  C-  ^  2tt       (x(s)  -  x(t))2  +  (y(s)  -  y(t))2  K     ' 

1  d  log  r(s,z) 

■as, 


2n        dn„ 
where     ((s)  =  1(5)  +  iy(s),      z(t)  =  x(t)  +  iy(t). 

Therefore  u(z)  =  Re(f(z))  for  z  in  D,  and  u(z)  =  Re(f(z))  for  z  outside  £>. 
To  find  the  discontinuities  between  u  and  u,  we  use  the  known  discontinuities 
of  Cauchy  integrals  across  the  boundary  curve  and  the  fact  that  Cauchy 
integrals  are  analytic  functions.  For  details  see  [10]. 

These  discontinuities  are  then  used  to  compute  approximations  to  the 
discrete  Laplacian  at  mesh  points  near  the  boundary  of  D.  Define  the  mesh 
function  £/,j  by 

v.      _    /  «(*«» yj)    (xi>Vj)eI> 
,J   "   \  u(xi,yj)    (xi,yj)eDc 

An  approximation  to  Ajjf/.j  at  all  mesh  points  of  R  can  be  computed  as 
follows.  At  mesh  points  (i,  j)  of  R  which  have  all  four  of  their  neighboring 
mesh  points  (i  +  1,  j),  (i  -  l,j),  (i,j  +  1),  and  (i,j  —  1)  on  the  same  side 
of  dD,  set  Ajjt/ij  =  0,  since  u  and  u  are  harmonic.  Consider  the  set  B+  of 
mesh  points  which  have  neighboring  mesh  points  on  both  sides  of  dD.  This 
set  consists  of  two  rings  of  mesh  points,  one  inside  and  one  outside  D.  Let 
p,  for  example,  be  a  mesh  point  which  is  in  D  but  whose  neighbor  to  the 
right,  pe,  is  not.  Let  p*  be  the  point  on  dD  on  the  line  between  p  and  ps, 
let  h\  be  the  distance  between  p  and  p*,  and  let  /*2  =  h  —  h\. 

By  manipulating  the  Taylor  series  at  p  and  pe,  both  evaluated  at  p*,  one 
can  derive  the  following  expression  for  u(pe)  —  u(p).  (For  details  see  [10].) 

u(PE)-u(p)    =     [u(p')  -  u(p'))  +   h2[ux{p')  -  ux{p')]  +  (14) 

-h\[uxx{p')-  uxx(pm)]   +   -h\[uxxx(p*)  -  uxxx(p')}   + 

2  D 

—hA2[uxxxx{p')  -  uxxxx{p')}   +   —  h\[uxxxxx(p')  -  uxxxxx(p')}  + 

hux{p)  +   -h2uxx(p)  +   -h3uxxx(p)  +   —  h4uxxxx(p)  + 
l  0  z4 

y^5uxx**x(p)  +  0(/i6). 
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Note  that  the  first  six  terms  depend  on  the  discontinuities  between  u  and  u 
and  their  derivatives  at  the  boundary.  The  other  terms  are  the  usual  Taylor 
series  terms.  Therefore,  once  these  discontinuities  are  known,  the  right  hand 
side  of  (14)  is  just  the  sum  of  known  quantities  and  Taylor  series  terms. 
Now  let  pw  be  the  point  to  the  left  of  p.  Then  we  have 

U(pw)  —  U(p)     =     {known  quantities}   —   hux(p)  +   -h2uxx(p)   —    —h  uXII{p) 

2  6 

+  -^;h4uxxxx(p)   -   —h5uxxxxx(p)  +   0(h6), 
where  the  quantities  in  braces  are  zero  if  pw  is  in  D.  In  any  case,  we  have 


U{pw)  +  U(pe)  —  2f/(p)     =     {known  quantities}   +   h2uxx(p)    (15) 

+  j^h4uxxxx(p)  +  0(he). 

If  pw  is  the  point  above  p  and  ps  is  the  point  below  p,  then  by  the  same 
arguments  we  have 


U(pn)  +  U(ps)  —  2(/(p)     =     {known  quantities}   +   h2uxx(p)    (16) 
+  -^h4uxxxx(p)  +   0(h6). 

Let  gf.  denote  the  sum  of  the  quantities  in  braces  in  (15)  and  (16). 
Adding  (15)  and  (16)  and  using  the  fact  that  uxx(p)  +  uyy(p)  =  0,  we  obtain 

h2A5hU(p)   =   <?+    +  ^[uXIXX(p)  +  uyyyy(p))  +  0(h6).  (17) 

The  same  equation  holds  with  u  replaced  by  u  if  p  is  a  point  outside  D  since 
u  is  then  harmonic  at  p.  (It  is  important  to  notice  that  it  is  not  assumed 
that  either  of  the  harmonic  functions  u  or  u  can  be  extended  beyond  dD. 
If,  however,  they  could  both  be  extended  one  mesh  width,  then  the  formulas 
we  would  obtain  for  the  discrete  Laplacian  clearly  agree  with  those  we  have 
derived.) 

Therefore,  if  the  integral  equation  can  be  solved  accurately  enough,  one 
can  obtain  a  fourth  order  accurate  approximation  to  h2A^U(p)  at  points  of 
B+ .  This  guarantees  the  accuracy  of  the  solution  obtained  after  applying  a 
fast  solver.  Define  t//1  to  be  the  solution  of  the  following  set  of  equations 
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(A^X  =  0    (Xi,yj)€R-B+ 
(AlUfrij^g+j        (xl,y])eB+  (18) 

(U?)ij  =  Uij        (xuyj)€dR 

where  the  boundary  values  Uij  on  dR  are  computed  by  evaluating  the  in- 
tegral (3)  using  the  computed  density  fi.  In  practice,  we  have  used  the 
trapezoid  rule  as  the  quadrature  formula  for  2-D  problems. 

If  the  values  of  gf.  and  Uij  are  sufficiently  accurate,  then  U^  will  be  a 
second  order  accurate  approximation  to  U.  For  a  proof  see  [10].  By  using 
a  higher  order  accurate  approximation  to  the  Laplacian  and  retaining  more 
terms  in  the  Taylor  series,  a  higher  order  accurate  solution  can  be  obtained. 

3.1.1      Computation  of  the  Conjugate  Function 

The  conjugate  of  a  harmonic  function  can  also  be  computed  at  small  addi- 
tional cost.  Using  the  Cauchy  Riemann  equations,  the  discontinuities  in  the 
conjugate  function  v  can  be  expressed  in  terms  of  the  discontinuities  in  u. 
For  example, 

Vy      -      Vy       =      -(Uy    -   U„)      = 


x'(sy  +  y'(sy 

These  discontinuities  could  also  be  computed  using  the  fact  that  v  is  the 
imaginary  part  of  the  same  Cauchy  integral  that  u  determines: 

n         r    ,  1     /     MO  .  v  1     /     d\ogr(s,z) 

v(z)    =   Im(—  / as)   =    —  / u{s)ds. 

y  '  y2i:JdD(,-z     '         2ir  JdD         ds        ^v  ; 

Knowing  these  discontinuities,  we  can,  in  the  same  way  as  before,  compute 
an  approximation  to  the  discrete  Laplacian  of  v,  and  then  apply  a  fast 
Poisson  solver  to  obtain  an  approximation  to  v. 

3.1.2      Computation  of  Derivatives 

An  important  property  of  these  methods  is  that  one  can  easily  compute 
the  derivative  of  a  harmonic  function  directly,  without  having  to  compute 
the  function  itself.  This  follows  because  all  the  derivatives  of  a  harmonic 
function  are  themselves  harmonic  and  because  one  can  compute  the  discon- 
tinuities in  all  these  derivatives. 
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3.2  Solution  of  Interface  Problems 

Since  the  integrals  in  (6)  and  (7)  are  also  integrals  of  double  layer  density 
functions,  the  same  method  can  be  used  to  solve  interface  problems. 

3.3  Solution  of  the  Biharmonic  Equation 

As  noted  in  Section  2,  the  solution  of  the  biharmonic  equation  can  be  reduced 
to  the  evaluation  of  two  analytic  functions  which  satisfy  (9)  and  the  pre- 
scribed boundary  conditions.  More  precisely,  the  evaluation  of  the  physically 
meaningful  quantities  reduces  to  the  evaluation  of  the  Goursat  functions  (8). 
Both  of  the  Goursat  functions  can  be  expressed  as  Cauchy  integrals 
whose  densities  are  given  in  terms  of  the  solution  of  the  integral  equa- 
tion (11).  Therefore,  it  suffices  to  be  able  to  evaluate  a  Cauchy  inte- 
gral g(z)  =  f9D  j^fdt  with  prescribed  density  function  f(t).  Suppose 
f(t)  =  p(x,y)  +  iq(x,y).  The  integral  g{z)  can  be  expressed  in  terms  of 
certain  integrals  of  double  layer  density  functions  and  their  conjugates: 

,  N  1      /"  dlogr       .dlogr.  1     /  dlogr       .dlogr. 

Lkx  Jqe)  on,  os  2tt  Jqd  on,  os 

Therefore,  the  same  methods  can  be  used  to  evaluate  the  Goursat  functions. 

4      Numerical  Experiments 

In  this  section  we  report  on  calculations  we  have  performed.  Parallel  im- 
plementation of  the  method  is  discussed,  and  details  are  given  on  the  per- 
formance of  the  iterative  methods  used  for  solving  the  integral  equations. 
Accuracy  of  the  method  was  discussed  in  detail  in  [10],  and  will  not  be 
repeated  here. 

The  complete  solution  method  consists  of  the  following  steps: 

1.  The  irregular  region  D  is  embedded  in  a  rectangle  R  with  a  uniform 
mesh,  and  distances  from  neighboring  mesh  points  to  the  boundary 
are  computed. 

2.  The  boundary(ies)  of  the  region  D  are  discretized,  and  the  integral 
operators  are  replaced  by  quadrature  formulae.  Since  the  kernels  are 
bounded  for  the  2-D  problems,  we  used  the  trapezoid  rule,  since  it  is 
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extremely  accurate  on  periodic  regions.  For  the  3-D  interface  prob- 
lem, a  different  procedure  was  used  since  the  kernel  is  unbounded.  We 
triangulated  the  surface  of  the  region,  which  was  flat.  We  assumed  a 
continuous,  piecewise  linear  density  function  and  integrated  the  prod- 
uct of  the  kernel  function  with  the  piecewise  linear  basis  functions 
exactly.  Explicit  quadrature  formulae  can  be  found  in  [9].  (These 
formulae  require  the  surface  to  be  flat.) 

3.  Having  formed  the  dense,  nonsymmetric  matrix  arising  from  the  in- 
tegral equation,  the  appropriate  linear  system  is  then  solved  for  the 
density  //  using  an  iterative  method.  Methods  that  we  experimented 
with  include  the  biconjugate  gradient  /  conjugate  gradient  squared 
method  [5,18],  GMRES  [16],  and  the  conjugate  gradient  method  ap- 
plied to  the  normal  equations  [6].  In  each  case,  the  diagonal  of  the 
matrix  was  used  as  a  preconditioner. 

4.  Approximations  to  the  discrete  Laplacian  are  computed  near  the  bound- 
ary dD,  and  approximations  to  the  extended  function  are  computed 
along  the  boundaries  of  R. 

5.  A  fast  Poisson  solver  is  applied  to  obtain  an  approximate  solution  to 
the  differential  equation. 

4.1      Test  Problems  and  Performance  of  Iterative  Methods 

Although  the  matrices  that  arise  from  the  integral  equations  of  Section  2  are 
dense  and  nonsymmetric,  they  are  also  well-conditioned  and,  in  most  cases, 
have  tightly  clustered  eigenvalues  and  singular  values.  For  this  reason,  all  of 
the  nonsymmetric  conjugate  gradient-type  iterative  methods  that  we  have 
tried  have  performed  well.  In  each  case,  the  diagonal  of  the  matrix  was  used 
as  the  preconditioner.  Results  are  presented  primarily  for  the  biconjugate 
gradient  (BCG)  method,  but  some  of  the  problems  were  also  solved  using 
GMRES  [16],  conjugate  gradients  on  the  normal  equations  (CGNR)  [6],  and 
the  conjugate  gradient  squared  (CGS)  method  [18]. 

The  effectiveness  of  these  iterative  methods  for  solving  the  double  layer 
integral  equation  for  Laplace's  equation  on  simply  connected  regions  is  well- 
documented.  (See,  for  example,  [11].)  We  have  performed  calculations  on 
doubly  connected  regions  and  have  found  the  results  to  be  similar.  In  partic- 
ular, we  have  tested  the  biconjugate  gradient  method  on  a  region  bounded 
by  two  nonconcentric  ellipses.  The  outer  ellipse  had  semiaxes  .35  and  .40, 
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n 

number  of  BCG 

Iterations 

50 

6 

100 

6 

200 

6 

360 

9 

Table  1:  Doubly  Connected  Region  1. 


n 

number  of  BCG 

Iterations 

50 

9 

100 

9 

200 

9 

360 

12 

Table  2:  Doubly  Connected  Region  2. 


the  inner  ellipse  had  semiaxes  .08  and  .10,  and  the  center  of  the  inner  el- 
lipse was  offset  .14  from  the  center  of  the  outer  ellipse  along  the  major  axis. 
Starting  with  zero  as  an  initial  guess,  we  iterated  until  the  relative  size  of 
the  pseudo-residual  was  reduced  below  10-6: 


WM-^w 


<    10" 


IIM-VII 

Here  M  is  the  diagonal  of  the  matrix,  rk  is  the  residual  at  step  k,  and  /  is 
the  right-hand  side  of  the  matrix  equation.  In  all  cases,  the  true  residual, 
/  —  Axk  was  computed,  since,  for  some  problems  this  may  differ  from  the 
vector  rk  generated  by  the  biconjugate  gradient  algorithm. 

Table  1  shows  the  number  of  iterations  required  to  satisfy  this  conver- 
gence criterion,  for  different  values  of  n,  the  number  of  boundary  discretiza- 
tion points.  Table  2  shows  results  for  a  slightly  different  region.  Now  the 
outer  ellipse  has  semiaxes  .35  and  .15. 

The  biconjugate  gradient  method  was  also  used  to  solve  the  three  di- 
mensional integral  equation  (7)  for  an  interface  problem  in  magnetics.  The 
region  D,  shown  in  Figure  1,  is  shaped  like  a  U  shaped  recording  head, 
with  equal  pole  tips  of  width  5  meters,  gap  1  meter,  height  30  meters,  and 
uniform  depth  10  meters.  The  function  4>{z)  was  the  applied  field  due  to 
an  infinitely  long  straight  wire  with  100  amps  of  current,  and  the  relative 
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Figure  1:  U  Shaped  Recording  Head 

permeability  was  set  to  1000.  In  one  case  we  used  361  boundary  elements 
and  in  another  case  we  used  1086  elements.  In  both  cases  our  initial  guess 
for  the  potential  at  a  point  on  the  boundary  was  the  value  of  the  applied 
field  at  that  point.  The  biconjugate  gradient  method  required  28  iterations 
to  achieve  convergence  for  the  smaller  problem,  31  iterations  for  the  larger 
problem. 

What  is  perhaps  more  interesting  is  that  good  results  were  also  obtained 
using  iterative  methods  to  solve  the  integral  equation  for  the  biharmonic 
equation.  Table  3  shows  the  number  of  iterations  required  by  various  itera- 
tive methods  to  satisfy  the  convergence  criterion  on  several  different  regions. 
We  always  placed  256  points  on  the  boundary  of  the  region,  so  the  matrix 
was  of  order  512.  The  crescents  mentioned  in  the  table  are  noncon%'ex  and 
are  pictured  in  Figure  2  and  Figure  3.  They  are  given  parametrically  by 

x(t)   =   cost  +  .15sin2i,       y(t)   =   .5sini  +  d cos2  t. 

In  crescent  1,  d  =  .4,  and  in  crescent  2,  d  =  .7. 

Most  of  the  numbers  in  the  table  were  obtained  by  taking  the  right  hand 
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Region 

BCG 

CGS 

GMRES(5) 

GMRES(40) 

CGNR 

1.  circle  of  radius  1 

4 

3 

4 

4 

4 

2.  ellipse  with  semiaxes  1  and  2 

9 

6 

10 

9 

12 

3.  ellipse  with  semiaxes  1  and  10 

32  (20) 

27  (18) 

73  (58) 

32  (20) 

74  (41) 

4.  ellipse  with  semiaxes  1  and  20 

58  (22) 

19  (10) 

44  (23) 

32  (14) 

145  (21 

5.  crescent  1 

11 

8 

12 

11 

16 

6.  crescent  2 

16 

11 

23 

15 

26 

Table  3:  Biharmonic  Equation  (11). 

side  of  the  equation  to  correspond  to  the  biharmonic  function  W(x,y)  = 
x2  +  y2  +  x,  and  using  a  random  initial  guess.  In  practice,  however,  it 
is  usually  advantageous  to  use  a  smooth  initial  guess,  such  as  zero  or  the 
right  hand  side,  since  the  true  solution  will  be  similarly  smooth.  To  de- 
termine whether  a  reasonably  chosen  initial  guess  could  significantly  reduce 
the  number  of  iterations,  we  solved  the  biharmonic  problem  for  several  more 
complicated  right  hand  sides,  using  a  zero  initial  guess.  The  results  did  not 
differ  significantly  from  those  obtained  previously,  except  on  the  highly  ec- 
centric ellipses.  The  numbers  in  parentheses  in  Table  3  are  iteration  counts 
using  an  initial  guess  of  zero,  when  the  biharmonic  function  W(z)  is  given 

by 

W(z)    =    Re(z<t>(z)  +   x(*)),       where 


4>{z)    =    £(-1)' 


k+  1 


+  e',     X(z)  =  ElTT-2^ 


k  +  1 


We  believe  that  these  convergence  rates  may  be  more  realistic  than  those 
obtained  with  a  random  initial  guess,  for  most  applications. 

Note  that  the  most  difficult  problems  for  the  iterative  methods  (with 
either  initial  guess)  were  those  defined  on  highly  eccentric  ellipses.  For  prob- 
lems 3  and  4,  the  GMRES  method  converged  in  32  iterations  (for  the  random 
initial  guess),  provided  all  direction  vectors  were  saved  (GMRES(40)),  but 
required  significantly  more  iterations  when  only  5  direction  vectors  were 
saved  (GMRES(5)).  Similarly,  the  other  iterative  methods  were  slower  to 
converge  on  these  regions. 

It  should  be  noted  that  some  of  these  iterative  methods  require  more 
work  per  iteration  than  others.  Since  the  bulk  of  the  work  at  each  iteration 
is  in  applying  the  dense  matrix  to  a  vector,  the  BCG,  CGS,  and  CGNR 
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methods  require  almost  twice  as  much  work  per  iteration  as  the  GMRES 
method,  even  when  40  direction  vectors  are  saved  and  orthogonalized  against 
at  each  step.  (This  is  because  the  matrix  is  dense  and  so,  a  standard  matrix 
vector  multiply  requires  n2  operations,  compared  to  about  40n  for  orthogo- 
nalizing.  In  future  work,  it  is  planned  to  replace  this  standard  matrix  vector 
multiply  routine  by  an  0(n)  method  [3].  Still,  the  matrix  vector  multiply 
will  require  significantly  more  than  40n  operations,  so  the  comparison  of 
operation  counts  per  iteration  for  the  various  iterative  methods  probably 
will  not  change.)  Thus,  to  compare  the  methods  according  to  total  work 
performed,  the  iteration  counts  for  BCG,  CGS,  and  CGNR  should  be  mul- 
tiplied by  two  and  compared  with  that  of  GMRES(5)  and  GMRES(40).  It 
can  then  be  seen  that  the  GMRES(40)  method  is  the  most  efficient,  with 
CGS  or  GMRES(5)  second.  In  addition,  it  may  be  necessary  to  compute  the 
residual  directly  (as  we  have  done)  at  each  step  of  the  CGS  algorithm,  since 
the  vector  rk  computed  by  updating  may  not  resemble  the  true  residual.  If 
this  is  done,  then  the  number  of  matrix  vector  multiplies  per  CGS  iteration 
is  3  instead  of  2,  and  the  GMRES  methods  look  even  better  in  comparison. 
Of  course,  GMRES(40)  requires  almost  40  vectors  more  in  storage  than 
these  other  methods,  and  it  is  difficult  to  predict  just  how  many  vectors 
will  be  needed  in  order  to  obtain  rapid  convergence  with  GMRES.  The  least 
efficient  iterative  method  for  these  problems  is  CGNR. 

To  understand  why  the  iteration  counts  are  as  described,  we  computed 
the  eigenvalues  and  singular  values  for  these  problems.  The  512  singular  val- 
ues for  each  region  are  plotted  in  decreasing  order  of  magnitude  in  Figure  4. 
The  eigenvalues  were  very  close  to  the  singular  values  (almost  indistinguish- 
able on  graphs  such  as  Figure  4)  and  had  tiny  imaginary  parts  and  non- 
negative  real  parts.  This  tends  to  be  an  advantage  for  the  GMRES  method 
over,  say,  CGNR,  because  the  convergence  rate  of  GMRES  is  governed  by 
the  eigenvalues  while  that  of  CGNR  is  determined  by  the  squared  singular 
values.  We  note  that  in  all  cases,  the  eigenvalues  and  singular  values  cluster 
around  1.  For  the  eccentric  ellipses,  more  eigenvalues  and  singular  values 
appear  near  the  ends  of  the  spectrum,  causing  some  difficulty  for  the  itera- 
tive methods.  The  ratio  of  the  largest  to  the  second  smallest  singular  value 
for  each  of  the  six  regions  is  2.0,  5.2,  265,  2031,  8.5,  and  22.3,  respectively. 

The  presence  of  a  zero  singular  value  does  not  adversely  affect  the  con- 
vergence rate  of  the  iterative  methods.  The  reason  for  this  is  as  follows.  For 
each  of  these  iterative  methods,  the  residual  rk  satisfies 

r*   =   Pk(B)r°, 
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where  Pk  is  a  kth  or  2kth  (for  CGS)  degree  polynomial  with  value  one  at 
the  origin  and  B  =  A  for  the  GMRES,  BCG,  and  CGS  methods  (actually, 
M~lA,  where  M  is  the  diagonal  of  A),  B  =  AAT  for  CG  on  the  normal 
equations.  Assuming  B  has  a  complete  set  of  eigenvectors  V  (which  it  does 
for  each  of  these  problems),  we  can  write  B  =  VAV-1  and 

K-V   =   Pk(A)V-lr°.  (19) 

Now  suppose  some  eigenvalue  A,  is  zero.  Since  the  equations  are  consistent, 
the  ith  component  of  V~l  times  the  right  hand  side  /  of  the  equation  Bx  =  f 
must  also  be  zero;  for  we  have 

X,(V-lx),  =  (K-1/),. 

It  also  follows  that  the  iih  component  of  V-1  times  the  initial  residual  r°  = 
/  -  Bx°  is  zero: 

(I/-V),  =  (F-1/).-Ai(y-1A-. 

It  then  follows  from  (19)  that  the  ith  component  of  V~lrk  is  zero  for  all  steps 
k,  and  it  is  easy  to  check  that  the  coefficients  computed  by  each  of  these 
iterative  methods  are  the  same  as  those  that  would  be  computed  if  this  zero 
eigenvalue  were  not  present.  Thus,  these  methods  converge  to  a  solution  of 
the  singular  linear  system  at  the  same  rate  as  they  would  converge  to  the 
solution  of  a  nonsingular  linear  system  having  only  the  nonzero  eigenvalues. 
Table  4  shows  the  number  of  iterations  needed  to  satisfy  the  same  conver- 
gence criterion,  when  we  discretized  equation  (10)  instead  of  (11).  Note  that 
the  number  of  iterations  required  was  significantly  greater  for  each  of  the 
iterative  methods.  The  GMRES(5)  method  failed  on  the  crescent  problems, 
stagnating  (ceasing  to  reduce  the  error)  well  before  it  reached  the  desired 
level  of  accuracy. 

4.2      Parallel  and  Vector  Implementation  of  the  Method 

The  major  part  of  the  work  at  each  iteration  of  the  biconjugate  gradient 
algorithm  is  in  multiplying  the  matrix  and  its  transpose  by  an  arbitrary 
vector.  This  operation  is  easily  vectorized  and/or  parallelized,  as  are  the 
other  operations  (dot  products,  saxpy's,  etc.)  required  for  an  iteration.  On 
a  machine  with  both  vector  and  parallel  capabilities,  different  blocks  of  the 
matrix  can  be  assigned  to  different  processors  and  vectorization  can  be  per- 
formed over  the  rows  within  each  block.  This  can  actually  be  accomplished 
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Region 

BCG 

CGS 

GMRES(5) 

GMRES(40) 

CGNR 

1.  circle  of  radius  1 

8 

7 

10 

8 

11 

2.  ellipse  with  semiaxes  1  and  2 

12 

9 

15 

12 

17 

3.  ellipse  with  semiaxes  1  and  10 

45 

35 

76 

34 

92 

4.  ellipse  with  semiaxes  1  and  20 

86 

41 

119 

50 

188 

5.  crescent  1 

30 

30 

>  200 

24 

45 

6.  crescent  2 

50 

64 

>  200 

30 

82 

Table  4:  Biharmonic  Equation  (10)  (Random  Initial  Guess). 

using  a  level-2  Bias  routine  [4],  and  these  routines  have  already  been  opti- 
mized on  many  parallel/vector  supercomputers.  Another  significant  part  of 
the  work  is  in  applying  the  fast  Poisson  solver  after  the  integral  equations 
have  been  solved.  This  is  also  easily  parallelized,  with  different  processors 
computing  FFT's  simultaneously,  while  the  individual  FFT's  may  be  vec- 
torized. 

We  implemented  the  entire  algorithm  for  computing  a  harmonic  function, 
its  conjugate,  and  its  first  derivatives  on  an  8-processor  shared  memory 
MIMD  machine  -  the  NYU  Ultracomputer  prototype  -  and  obtained  good 
speedup  over  the  serial  code.  The  region  was  the  first  doubly  connected 
region  bounded  by  two  ellipses  described  above.  We  again  placed  200  points 
on  the  boundary  and  used  a  33  by  33  rectangular  mesh  on  the  rectangle  in 
which  the  region  was  embedded. 

In  this  case  the  biconjugate  gradient  routine  accounted  for  about  40% 
of  the  time  required  by  the  serial  code.  To  obtain  good  speedup  of  the 
entire  code,  it  was  also  necessary  to  parallelize  the  other  sections  of  the 
code  -  generating  the  matrix,  computing  the  Laplacian  at  irregular  mesh 
points,  and  applying  the  fast  Poisson  solver.  These  sections  were  also  easily 
parallelized,  and  the  use  of  a  parallel  DO  loop  facility,  called  DOALL,  on  the 
ultracomputer  allowed  efficient  parallelization  of  the  entire  code  with  only 
minor  modifications  to  the  original  serial  code.  DOALL  prespawns  tasks 
and  assigns  loop  indices  to  processors  dynamically  [1].  Different  elements 
of  the  matrix  were  generated  in  parallel  and  the  Laplacian  was  computed 
at  irregular  mesh  points  in  parallel.  The  2-D  fast  Poisson  solver  consists  of 
1-D  FFT's  which  were  also  executed  in  parallel. 

Figure  5  shows  the  time  and  speedup  on  1  to  8  processors  for  the  above 
problem.  Using  8  processors,  an  overall  speedup  of  about  a  factor  of  6  was 
obtained,  with  the  biconjugate  gradient  routine  obtaining  about  a  factor  of 
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7  speedup  over  its  time  on  1  processor.  The  time  for  the  parallel  code  run  on 
1  processor  was  about  5%  slower  than  the  time  for  the  serial  code.  A  factor 
of  7  speedup  on  the  ultracomputer  appears  to  be  near  the  best  attainable, 
due  to  bus  traffic. 

On  the  Cray  X-MP,  we  measured  the  difference  between  the  time  re- 
quired for  the  matrix  generation  only,  when  vectorized  and  unvectorized. 
For  this  section  of  the  code,  vectorization  resulted  in  a  factor  of  9  speedup, 
and  its  effect  on  the  other  sections  appears  to  be  similar. 


Acknowledgment.  The  authors  wish  to  thank  the  referees  for  many  helpful  com- 
ments. 
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